Java Speech Markup Language Specification



1.0 Introduction 

A speech synthesizer provides a computer with the ability to speak. Users and applications provide text 
to a speech synthesizer, which then converts it to audio.



Figure 1: Text from an application is converted to audio output

Speech synthesizers are developed to produce natural-sounding speech output. However, natural human 
speech is a complex process, and the ability of speech synthesizers to mimic human speech is limited in 
many ways. For example, speech synthesizers do not "understand" what they say, so they do not always 
use the right style or phrasing and do not provide the same nuances as people. 

The Java™ Speech Markup Language (JSML) allows applications to annotate text with additional 
information that can improve the quality and naturalness of synthesized speech. JSML documents can 
include structural information about paragraphs and sentences. JSML allows control of the production 
of synthesized speech, including the pronunciation of words and phrases, the emphasis of words 
(stressing or accenting), the placements of boundaries and pauses, and the control of pitch and speaking 
rate. Finally, JSML allows markers to be embedded in text and allows synthesizer-specific controls. 

For the example in Figure 1, we might use JSML tags to indicate the start and end of the sentence and to 
emphasize the word "can":

<SENT>Computers <EMP>can</EMP> speak.</SENT>

1.1 Role of JSML 

JSML has been developed to support as many types of applications as possible, and to support text 
markup in many different languages. To make this possible, JSML marks general information about the 
text and, whenever possible, uses cross-language properties. 

Although JSML may be used for text in Japanese, Spanish, Tamil, Thai, English, and nearly all modern 
languages, a single JSML document should contain text for only a single language. Applications are 
therefore responsible for management and control of speech synthesizers if output of multiple languages 
is required. 

JSML can be used by a wide range of applications to speak text from equally varied sources, including 
email, database information, web pages, and word processor documents. Figure 2 illustrates the basic 
steps in this process. 



Figure 2: JSML Process

The application is responsible for converting the source information to JSML text using any special 
knowledge it has about the content and format of the source information. For example, an email 
application can provide the ability to read email messages aloud by converting messages to JSML. This 
could involve the conversion of email header information (sender, subject, date, etc.) to a speakable 
form and might also involve special processing of text in the body of the message (for handling 
attachments, indented text, special abbreviations, etc.). Here is a sample of an email message converted 
to JSML:

<PARA>Message from <EMP>Alan Schwarz</EMP> about new synthesis technology.
Arrived at <SAYAS CLASS="time">2pm</SAYAS> today.</PARA>

<PARA>I've attached a diagram showing the new way we do speech
synthesis.</PARA>

<PARA>Regards, Alan.</PARA>
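
As a rough illustration of the conversion step, the following Python sketch builds JSML like the sample above from the parts of a message. The function name email_to_jsml and its parameters are hypothetical and are not part of JSML or the Java Speech API. A real application would also need to handle attachments, quoted text, and abbreviations.

```python
from xml.sax.saxutils import escape  # escapes "&", "<" and ">"


def email_to_jsml(sender, subject, time_text, body_paragraphs):
    """Build JSML text for one email message (hypothetical helper)."""
    # Header information is turned into a speakable sentence.
    header = (
        "<PARA>Message from <EMP>%s</EMP> about %s. "
        'Arrived at <SAYAS CLASS="time">%s</SAYAS> today.</PARA>'
        % (escape(sender), escape(subject), escape(time_text))
    )
    # Each body paragraph becomes its own PARA element.
    body = ["<PARA>%s</PARA>" % escape(p) for p in body_paragraphs]
    return "\n".join([header] + body)


jsml = email_to_jsml(
    "Alan Schwarz",
    "new synthesis technology",
    "2pm",
    ["I've attached a diagram.", "Regards, Alan."],
)
```

Escaping the source text before wrapping it in elements keeps characters such as "<" in a message body from being mistaken for markup.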

Similarly, a web browser could provide the ability to speak web pages by converting them to JSML. 
This process would involve conversion from HTML (HyperText Markup Language), the basic format of 
the web, to JSML. Readers may notice that JSML and HTML have a similar form. This similarity is 
because the formats share a common ancestor: Standard Generalized Markup Language (SGML). 
However, their roles are different. HTML is specialized for visual display of information, whereas 
JSML is for speaking information. 



2.0 Markup in JSML 

2.1 Basic Markup 

The special text in the following example is the text markup. 

<SENT>Computers <EMP>can</EMP> speak.</SENT>

This style will be familiar to you if you have used HTML, SGML, or XML. <SENT> indicates the start of 
a sentence element and </SENT> ends that sentence. Similarly, <EMP> and </EMP> mark a region to be 
emphasized.

sent and emp are referred to as elements. JSML defines eight elements. The following sections describe 
elements and other JSML markup in more detail.

2.2 Container Elements

JSML elements are either container elements or empty elements. A container element has a balanced 
start tag and end tag (e.g., <SENT> and </SENT>). The text appearing between the start and end tags is 
the contained text, as shown in Figure 3. An element's start-tag defines the type of element and may 
contain one or more attributes. All end-tags have the same name as their matching start-tag.

Figure 3: Elements and Attributes

2.2.1 Attributes 

Attributes are used to provide additional information about an element. Each JSML element has a set of 
defined attribute names and, in some cases, the attribute value is restricted to certain strings. For 
example, an emp element can mark words with a level attribute value of strong: 

Ich bin ein <EMP LEVEL="strong">Berliner!</EMP>

2.2.2 Element Nesting 

Some JSML elements allow the contained text to contain other elements. This is referred to as nesting. 

<PARA> text with <EMP> more text </EMP> </PARA> 

Nested elements cannot overlap or intertwine. For example, the following is not legal: 

<PARA> text with <EMP> more text </PARA> </EMP> 
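
The nesting rule can be checked with a stack of open element names. The following is a minimal Python sketch, not a full JSML parser: the function name is_properly_nested is hypothetical, and the regular expression assumes that attribute values contain no ">" character and that the text contains no CDATA sections.

```python
import re

# Matches a start-tag, end-tag, or empty-element tag and captures
# (leading "/", element name, trailing "/").
TAG = re.compile(r"<(/?)([A-Za-z]+)[^<>]*?(/?)>")


def is_properly_nested(jsml):
    """Return True if every container element closes in the right order."""
    stack = []
    for closing, name, empty in TAG.findall(jsml):
        if empty:
            continue  # empty element such as <BREAK/>: nothing to balance
        if not closing:
            stack.append(name)  # start-tag: remember it
        elif not stack or stack.pop() != name:
            return False  # end-tag does not match the innermost open element
    return not stack  # every start-tag must have been closed


legal = is_properly_nested("<PARA> text with <EMP> more text </EMP> </PARA>")
illegal = is_properly_nested("<PARA> text with <EMP> more text </PARA> </EMP>")
```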

2.3 Empty Elements 

An empty element has only one tag and does not contain any text. For example, the following results in 
a large break/pause in the speech at the point that the element occurs: 

A loud noise was heard, <BREAK SIZE="large"/>and the room became quiet. 

Because it doesn't mark any text, an empty element like break doesn't need an end-tag. Rather, the "/>" 
marks the end of the start-tag and of the element. Like the container elements, empty elements can 
include attributes to provide additional information (for example, SIZE="large" above). 

2.4 Names 

All JSML element and attribute names are uppercase. All JSML attribute values are case sensitive. 
Furthermore, the naming of elements and attributes and the values of attributes are independent. 
Consequently, it is possible for an element to have an attribute of the same name (though none currently 
do). 

2.5 White Space

Within an element's start- and end-tags, single white space characters can optionally be replaced by 
multiple white space characters without changing the semantics of the element. 

White space contained between an element's start- and end-tags, or not contained by any element, is 
passed to the speech synthesizer and may affect speech output. 

2.6 Undefined Names 

Elements or attributes with undefined names are ignored by the speech synthesizer. This feature is useful 
in automatic generation and processing of JSML. For example, a web browser could generate the 
following: 

<URL ORIG="http://acme.com">URL is ACME dot com</URL>

In this example, the orig attribute is used to preserve the original URL. The contained text will be 
spoken by the speech synthesizer but the URL element tags will be ignored, because they are not defined 
in JSML and therefore not known to the synthesizer. 

This mechanism does allow speech synthesizers to extend the JSML element set by interpreting these 
additional elements specially. However, application developers should be aware that elements not 
specified in JSML are not portable across synthesizers and platforms. 
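
The behavior for undefined elements can be sketched as a filter that drops unknown tags but keeps their contained text. This Python example is only an illustration under simplifying assumptions: the helper name drop_undefined_tags is hypothetical, undefined attributes on defined elements are left alone, and attribute values are assumed not to contain ">".

```python
import re

# The eight elements defined by JSML.
JSML_ELEMENTS = {"PARA", "SENT", "SAYAS", "EMP", "BREAK", "PROS", "MARKER", "ENGINE"}

# Any start-tag, end-tag, or empty-element tag; group 1 is the element name.
TAG = re.compile(r"</?([A-Za-z]+)[^<>]*>")


def drop_undefined_tags(jsml):
    """Remove tags of elements that JSML does not define, keeping the text."""
    def keep_or_drop(match):
        return match.group(0) if match.group(1) in JSML_ELEMENTS else ""
    return TAG.sub(keep_or_drop, jsml)


spoken = drop_undefined_tags('<URL ORIG="http://acme.com">URL is ACME dot com</URL>')
```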

2.7 JSML Document Structure 

JSML is a subset of XML [1] (Extensible Markup Language), which is a simple dialect of SGML. By 
being a subset of XML, JSML gains a standardized, extensible syntax that is not tied to the Java Speech 
API (JSAPI). This means that: 

• JSML is readable and editable by both humans and computers. 

• General XML editors can be used to simplify writing JSML. 

• JSML markup is very regular and easy for a synthesizer to parse. 

• Text containing JSML can be prepared by hand using non-JSAPI-specific editors. 

Although it is not necessary to know about XML to understand JSML or to use JSML, the following 
may be of interest. If JSML text starts with: 

<?XML version="1.0" encoding="UCS-2"?>
<JSML> 

and ends with: 

</JSML> 

then the JSML is a well-formed XML document. This means that a speech synthesizer can use a generic 
XML parser on JSML text. If a synthesizer supplies the parser with a DTD (Document Type Definition) 
for JSML, then the synthesizer's work is significantly reduced. These opening and closing elements are 
optional in JSML documents.
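
To illustrate the use of a generic XML parser, the sketch below wraps JSML text in a JSML root element when one is missing and parses it with the ElementTree module from the Python standard library. The helper name parse_jsml is hypothetical. The sketch leaves out the XML declaration and does not use a DTD.

```python
import xml.etree.ElementTree as ET


def parse_jsml(jsml_text):
    """Wrap JSML text in a <JSML> root if needed and parse it as XML."""
    text = jsml_text.strip()
    if not text.startswith("<JSML>"):
        text = "<JSML>" + text + "</JSML>"
    return ET.fromstring(text)


root = parse_jsml("<SENT>Computers <EMP>can</EMP> speak.</SENT>")
sent = root.find("SENT")
emp = sent.find("EMP")
```

In the resulting tree, the text before the EMP element is held in sent.text, the emphasized word in emp.text, and the text after the element in emp.tail.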

Having a DTD allows the application to use the full power of XML for generating text, for example, 
entity references that can act as a shorthand for repetitive JSML, and then to use generic text processing 
tools for generating the JSML. 

2.7.1 Splitting JSML Documents 

A JSML document must be syntactically complete. Every start tag must be an empty element (no end 
tag required) or have a matching end tag. If text is split into multiple JSML documents to be spoken in 
sequence, then the text should be split between paragraphs or perhaps between sentences. This is 
because each document will be spoken independently and important phrasing and pitch information will 
be affected by inappropriate boundaries. 
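
One possible way to split text at paragraph boundaries is sketched below in Python. It assumes that all of the text is held in explicit PARA elements, which never nest, so a non-greedy match finds each complete paragraph. The function name split_jsml_paragraphs and its max_paragraphs parameter are hypothetical.

```python
import re

# PARA elements do not nest, so the first </PARA> after a start-tag closes it.
PARA = re.compile(r"<PARA\b[^>]*>.*?</PARA>", re.DOTALL)


def split_jsml_paragraphs(jsml, max_paragraphs):
    """Group complete PARA elements into separate JSML documents."""
    paragraphs = PARA.findall(jsml)
    return [
        "\n".join(paragraphs[i:i + max_paragraphs])
        for i in range(0, len(paragraphs), max_paragraphs)
    ]


docs = split_jsml_paragraphs(
    "<PARA>One.</PARA><PARA>Two.</PARA><PARA>Three.</PARA>", 2
)
```

Each resulting document is syntactically complete because no PARA element is ever cut in two.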

2.8 Escaping/Quoting Text 

If text to be spoken contains a less-than sign ("<", which is \u003C) or an ampersand ("&", which is 
\u0026), then the text needs to be escaped or quoted to prevent the possibility of some of the text being 
mistaken for JSML tags. There are several methods available:

• Individual less-than signs may be replaced with one of the following character sequences (without 
the quotes): "&lt;", "&#60;", or "&#x3C;".

• Individual ampersands may be replaced with one of the following character sequences (without 
the quotes): "&amp;", "&#38;", or "&#x26;".

• A CDATA section can be placed around the entire text.

A CDATA section has the general form of:

<![CDATA[the text that is being escaped]]>

The text that is being escaped can contain any character sequence that does not include the "]]>" sequence.
A CDATA section can be used on text that is contained by an element, for example:

<EMP>Joe Doe <![CDATA[<joe.doe@acme.com>]]></EMP>

and on text that is not contained by an element, for example:

<![CDATA[X < Y is a boolean expression.]]>

Synthesizers handle CDATA sections by stripping away the <![CDATA[ and ]]> markup and not parsing 
the CDATA section's contents for JSML.
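
Both approaches can be sketched in a few lines of Python. The function names escape_jsml_text and cdata_section are hypothetical. The first uses the named entity forms "&lt;" and "&amp;", and the second refuses text containing "]]>" because that sequence would end the section early.

```python
def escape_jsml_text(text):
    """Escape ampersands and less-than signs in text to be spoken."""
    # Replace ampersands first so the "&" in "&lt;" is not escaped again.
    return text.replace("&", "&amp;").replace("<", "&lt;")


def cdata_section(text):
    """Quote text by wrapping it in a CDATA section."""
    if "]]>" in text:
        raise ValueError('text for a CDATA section must not contain "]]>"')
    return "<![CDATA[" + text + "]]>"


escaped = escape_jsml_text("X < Y & Z")
quoted = cdata_section("X < Y is a boolean expression.")
```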

2.9 Comments 

A JSML comment begins with a <!-- character sequence and ends with a --> character sequence and 
may contain any text except the two-character sequence --.

Comments can be placed within text that is to be spoken (the comments will not be spoken). 

How now brown <!-- This is an example comment --> cow.

Comments may not be placed within elements.
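
A processor that does not use an XML parser could remove comments before speaking with a pattern such as the one in this Python sketch. The helper name strip_comments is hypothetical, and the sketch assumes the text contains no CDATA sections, whose contents must not be treated as markup.

```python
import re

# Non-greedy, so each comment ends at the first "-->" that follows it.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_comments(jsml):
    """Remove JSML comments; the surrounding text is left untouched."""
    return COMMENT.sub("", jsml)


spoken = strip_comments("How now brown <!-- This is an example comment --> cow.")
```

The white space on either side of the comment is kept, so the result here contains two spaces between "brown" and "cow".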



3.0 JSML Elements 

JSML syntax consists of structural, production, and miscellaneous elements. The following table 
presents an overview of JSML's elements. These elements are defined in detail in the following sections. 
The section on structural elements also describes implicit paragraph marking, which is an alternative to 
the para element. 



Element Function   Element Name   Element Type   Element Description

Structure          PARA           Container      Specifies that the contained text is a paragraph.

                   SENT           Container      Specifies that the contained text is a sentence.

Production         SAYAS          Container      Specifies how to say the contained text.

                   EMP            Both           Specifies emphasis for the contained text or
                                                 immediately following text.

                   BREAK          Empty          Specifies a break in the speech.

                   PROS           Container      Specifies a prosodic property, such as baseline pitch,
                                                 rate, or volume, for the contained text.

Miscellaneous      MARKER         Empty          Requests a notification when speech reaches the marker.

                   ENGINE         Container      Native instructions to a specified speech synthesizer.



4.0 Structural Elements

4.1 PARA

PARA     Container element that marks the contained text as a paragraph.

MARK     Optional attribute that requests a notification when the synthesizer's production of
         audio reaches this element's contained text. Its value is the text to be made available
         when the notification occurs.

The para element declares a range of text to be a paragraph. For example: 

<PARA>This is a short paragraph.</PARA><PARA>The subject has changed, so this is 
a new paragraph.</PARA>

para elements do not contain other para elements; that is, para elements do not embed or nest. For 
example, the following is not legal: 

<PARA>The raven spoke.

<PARA>I've come from Norway at the command of the king. He sues for
peace.</PARA>

</PARA>

4.2 Implicit Paragraph Marking 

In JSML, a blank line (that is, a line that contains only whitespace characters) that separates one block of 
text from another is treated the same as explicitly marking the block as a paragraph. Strictly speaking, a 
blank line is not an element; however, it does serve the same function as the para element.

The following fragments result in the same speech: 

She went to school and passed the tests.

When she returned home, the sun had set.

and

<PARA>She went to school and passed the tests.</PARA> <PARA>When she returned
home, the sun had set.</PARA>

and

<PARA>She went to school and passed the tests.</PARA>
<PARA>When she returned home, the sun had set.</PARA>

and

<PARA>She went to school and passed the tests.
When she returned home, the sun had set.</PARA>

A blank line can be created by any of the following or by inserting white space (that is, any combination 
of spaces, \u0020, horizontal tabulations, \u0009, and ideographic spaces, \u3000) in any of the 
following: 

• Consecutive carriage return and line feed pairs (that is, \u000D\u000A\u000D\u000A)

• Consecutive line feeds/newlines (that is, \u000A\u000A)

• Consecutive Unicode line separators (that is, \u2028\u2028)

• A single Unicode paragraph separator (that is, \u2029)
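
The blank-line rules above can be approximated with a regular expression, as in the following Python sketch. It reads "inserting white space" as allowing spaces, tabs, and ideographic spaces between the two line terminators, which is an interpretation rather than something the rules state exactly. The function name split_implicit_paragraphs is hypothetical.

```python
import re

# White space that may sit on the otherwise empty line:
# space (\u0020), horizontal tab (\u0009), ideographic space (\u3000).
WS = "[ \t\u3000]*"

BLANK_LINE = re.compile(
    "|".join([
        "\r\n" + WS + "\r\n",      # consecutive CR LF pairs
        "\n" + WS + "\n",          # consecutive line feeds
        "\u2028" + WS + "\u2028",  # consecutive Unicode line separators
        "\u2029",                  # a single Unicode paragraph separator
    ])
)


def split_implicit_paragraphs(text):
    """Split text into paragraphs wherever a blank line occurs."""
    parts = BLANK_LINE.split(text)
    return [p.strip() for p in parts if p.strip()]


paragraphs = split_implicit_paragraphs(
    "She went to school and passed the tests.\n\n"
    "When she returned home, the sun had set."
)
```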

4.3 SENT



SENT     Container element that marks the contained text as a sentence.

MARK     Optional attribute that requests a notification when the synthesizer's production of
         audio reaches this element's contained text. Its value is the text to be made available
         when the notification occurs.



The sent element declares a range of text to be a sentence. For example: 

<SENT>C'est la vie.</SENT>

sent elements do not contain other sent elements; that is, sent elements do not embed or nest. For 
example, the following is not legal:

<SENT>He said, <SENT>"I leave tomorrow."</SENT></SENT>



5.0 Production Elements 



5.1 SAYAS 



SAYAS    Container element that says how to pronounce a word or short phrase. One of the
         SUB, CLASS, or PHON attributes is required.

SUB      Optional attribute having a value of the text that is to be spoken as a substitute for the
         contained text.

CLASS    Optional attribute indicating how to pronounce the contained text. Values: date, digits,
         literal, number, time.

PHON     Optional attribute having a value of a string of IPA (International Phonetic Alphabet)
         characters or the Java \uXXXX representation of the Unicode IPA characters that are
         to be spoken instead of the contained text.

MARK     Optional attribute that requests a notification when the synthesizer's production of
         audio reaches this element's contained text. Its value is the text to be made available
         when the notification occurs.



It is frequently difficult for a synthesizer to determine how to pronounce abbreviations, acronyms, 
proper names (particularly those originating in a language that is different from that of the synthesizer), 
domain-specific jargon, and homographs [2]. If an application has information that resolves a difficulty, it 
can provide that information to the synthesizer by using the sayas element. 

5.1.1 SUB (Substitute) 

The sub attribute defines substitute text to be spoken instead of the contained text. For example: 

<SAYAS SUB="I triple E">IEEE</SAYAS> 

5.1.2 CLASS 

When the class attribute value is date, the contained text should be pronounced as a date. For example: 

<SAYAS CLASS="date">Jan. 1952</SAYAS>

<!-- spoken as January nineteen fifty-two -->

Note that simply stating that something is a date does not always yield the desired pronunciation. A sub 
attribute may be required. For example, 4/3/97 is ambiguous in: 

<SAYAS CLASS="date">4/3/97</SAYAS> 

It might be spoken as "April third nineteen ninety-seven" or as "March fourth nineteen ninety-seven." It 
is unambiguous if a sub attribute is used:

<SAYAS SUB="March fourth nineteen ninety-seven">4/3/97</SAYAS>

When the class attribute value is literal, the letters, digits, and other characters of the contained text 
should be spoken individually. In English, this is effectively doing spelling. This is useful for speaking 
many acronyms and for speaking numbers as digits. For example: 

<SAYAS CLASS="literal">JSML</SAYAS>
<!-- spoken as J S M L -->

<SAYAS CLASS="literal">12</SAYAS> <!-- spoken as one two -->

<SAYAS CLASS="literal">100%</SAYAS> <!-- might be spoken as one zero zero
percent sign -->

When the class attribute value is number, the contained text should be pronounced as a number. For 
example: 

<SAYAS CLASS="number">12</SAYAS> <!-- spoken as twelve -->



5.1.3 PHON (Phonetic Pronunciation) 

The phon attribute uses the International Phonetic Alphabet (IPA) character subset of Unicode to define 
a sequence of sounds. IPA characters are represented by codes from \u0250 to \u02AF, by modifiers 
from \u02B0 to \u02FF, by diacritics from \u0300 to \u036F, and by certain Latin, Greek and symbol 
characters from the range \u0000 to \u017F. Details of the Unicode IPA support are provided in The 
Unicode Standard, Version 2.0 (The Unicode Consortium, Addison-Wesley Developers Press, 1996).



The following examples are equivalent:

<SAYAS PHON="foʊnɛtɪks"> phonetics </SAYAS>

<SAYAS PHON="\u0066\u006F\u028A\u006E\u025B\u0074\u026A\u006B\u0073"> phonetics </SAYAS>



Note that sounds from outside the language of the synthesizer may not be reproduced accurately. 
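
The \uXXXX form can be produced mechanically from a string of IPA characters. The short Python sketch below shows one way to do that. The helper name to_unicode_escapes is hypothetical, and the sketch assumes every character has a code point of \uFFFF or below, which holds for the IPA ranges listed above.

```python
def to_unicode_escapes(ipa):
    """Return the Java-style \\uXXXX representation of each character."""
    return "".join("\\u%04X" % ord(ch) for ch in ipa)


# IPA transcription of "phonetics", written here with Python escapes.
ipa_text = "fo\u028an\u025bt\u026aks"
escaped = to_unicode_escapes(ipa_text)
```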

5.1.4 Nesting



Elements cannot be nested within the contents of a sayas. 



Legal example: 



<PROS RATE="-30%"><SAYAS SUB="sun dot com">sun.com</SAYAS></PROS>

Illegal example:

<SAYAS SUB="sun dot com"><PROS RATE="-30%">sun.com</PROS></SAYAS>



5.2 EMP 



EMP      Element that specifies a level of emphasis for the contained text (if used as a container
         element) or the following word (if used as an empty element). LEVEL="moderate" is
         the default attribute.

LEVEL    Required attribute that indicates the level of emphasis. Values: strong, moderate,
         none, or reduced. LEVEL="moderate" is the default attribute.

MARK     Optional attribute that requests a notification when the synthesizer's production of
         audio reaches this element's contained text. Its value is the text to be made available
         when the notification occurs.



The emp element specifies that a range of text should be spoken with emphasis. The level attribute's 
values are strong (for strong emphasis), moderate (for some emphasis), none (for no emphasis), and 
reduced (for a reduction in emphasis). 

For example:

Clap your <EMP>hands.</EMP>

Clap your <EMP LEVEL="moderate">hands.</EMP>

The emp element can also be an empty element, where it specifies that the immediately following text [3] is 
to be emphasized.

The following examples have the same effect as above:

Clap your <EMP/>hands.

Clap your <EMP LEVEL="moderate" MARK="hands"/>hands.



5.3 BREAK 



BREAK    Empty element that marks a break in the speech. SIZE="medium" is the default
         attribute if neither SIZE nor MSECS is provided.

MSECS    Optional attribute having a value of an integral number of milliseconds.

SIZE     Optional attribute having one of the following relative values: none, small, medium,
         or large.

MARK     Optional attribute that requests a notification when the synthesizer's production of
         audio reaches this element's contained text. Its value is the text to be made available
         when the notification occurs.



The break element is an empty element that is used to mark phrasing boundaries in the speech output. 
To indicate what type of break is desired, the element can include a size attribute or a msecs attribute, 
but not both. A size attribute indicates a break that is relative to the characteristics of the current 
speech, and a msecs attribute indicates a pause for an absolute amount of time. 

Where possible, the break should be defined by a size rather than a msecs, because, in most languages, 
breaks are produced by special movements in pitch, by timing changes, and often with a pause. Those 
factors are significantly affected by speaking context. For example, a 300 millisecond break in fast 
speech sounds more significant than it does in slow speech. 

Examples: 

<BREAK/>

<BREAK SIZE="small" MARK="145"/>

<BREAK MSECS="300"/>
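
An application that generates JSML might enforce the "size or msecs, but not both" rule when it builds the tag. The following Python sketch shows one way to do so. The function name break_tag and its parameters are hypothetical.

```python
from xml.sax.saxutils import quoteattr  # quotes and escapes an attribute value

BREAK_SIZES = ("none", "small", "medium", "large")


def break_tag(size=None, msecs=None, mark=None):
    """Build a BREAK empty element, rejecting SIZE together with MSECS."""
    if size is not None and msecs is not None:
        raise ValueError("use SIZE or MSECS, not both")
    attrs = []
    if size is not None:
        if size not in BREAK_SIZES:
            raise ValueError("unknown SIZE value: %r" % (size,))
        attrs.append("SIZE=" + quoteattr(size))
    if msecs is not None:
        attrs.append('MSECS="%d"' % int(msecs))  # integral milliseconds
    if mark is not None:
        attrs.append("MARK=" + quoteattr(mark))
    return "<BREAK" + "".join(" " + a for a in attrs) + "/>"


default_break = break_tag()
small_break = break_tag(size="small", mark="145")
timed_break = break_tag(msecs=300)
```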

5.4 PROS

PROS     Element that specifies prosodic information for the contained text. At least one of the
         RATE, PITCH, RANGE, and VOL attributes is required.

RATE     Optional numeric attribute that sets the speaking rate in words per minute. See the text
         following this table for the type of values allowed.

VOL      Optional numeric attribute that sets the output volume on a scale of 0.0 to 1.0 where
         0.0 is silence and 1.0 is maximum loudness. See the text following this table for the
         type of values allowed.

PITCH    Optional numeric attribute that sets the baseline pitch in Hertz. See the text following
         this table for the type of values allowed.

RANGE    Optional numeric attribute that sets the pitch range in Hertz. See the text following
         this table for the type of values allowed.

MARK     Optional attribute that requests a notification when the synthesizer's production of
         audio reaches this element's contained text. Its value is the text to be made available
         when the notification occurs.



The pros element provides prosody control for JSML. Prosody is a collection of features of speech that 
includes its timing, intonation and phrasing. Proper control of prosody can improve the 
understandability and naturalness of speech. The attribute values are better viewed as being "hints" to 
the synthesizer. Most of the attributes of the pros tag accept numeric values. These values are floating 
point numbers of the form 23, 10.8, or -0.55.

The rate attribute is defined in words per minute and can have values of the following forms: 



n        Sets the speaking rate to n

+n       Increases the speaking rate by n

-n       Decreases the speaking rate by n

+n%      Increases the speaking rate by n percent

-n%      Decreases the speaking rate by n percent

reset    Sets the speaking rate to the default

For example,

<PROS RATE="150">text at 150 words per minute</PROS>

The vol attribute can have values of the following forms:

n        Sets the volume to n (between 0.0 and 1.0 inclusive)

+n       Increases the volume by n (to a final value no larger than 1.0)

-n       Decreases the volume by n (to a final value no smaller than 0.0)

+n%      Increases the volume by n percent

-n%      Decreases the volume by n percent

reset    Sets the volume to the default



The pitch and range attributes can have values of the following forms: 



n        Sets the baseline pitch or pitch range to n Hertz

+n       Increases the baseline pitch or pitch range by n Hertz

-n       Decreases the baseline pitch or pitch range by n Hertz

+n%      Increases the baseline pitch or pitch range by n percent

-n%      Decreases the baseline pitch or pitch range by n percent

reset    Sets the baseline pitch or pitch range to the default
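
The value forms shared by the rate, vol, pitch, and range attributes can be interpreted as in the following Python sketch. The function name apply_pros_value and its parameters are hypothetical. The sketch does not clamp volume to the 0.0 to 1.0 scale, and it treats an unsigned percentage such as "30%" as an increase, although that form is not listed above.

```python
def apply_pros_value(current, value, default):
    """Return the new setting after applying one PROS attribute value.

    current: the setting in effect before the element (for example, Hertz)
    value:   the attribute value, such as "150", "+10", "-20%" or "reset"
    default: the setting that "reset" restores
    """
    value = value.strip()
    if not value:
        raise ValueError("empty attribute value")
    if value == "reset":
        return default
    if value.endswith("%"):
        # Relative change in percent, such as "+15%" or "-30%".
        return current * (1.0 + float(value[:-1]) / 100.0)
    if value[0] in "+-":
        # Relative change in the attribute's own units, such as "+10".
        return current + float(value)
    return float(value)  # absolute setting, such as "150"


absolute = apply_pros_value(200.0, "150", 180.0)
raised_by_ten = apply_pros_value(200.0, "+10", 180.0)
lowered_by_fifth = apply_pros_value(200.0, "-20%", 180.0)
restored = apply_pros_value(200.0, "reset", 180.0)
```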



Musically-inclined developers might think of pitch in semitones and octaves. A semitone rise in pitch is 
approximately +5.9% and a semitone drop is -5.6%. A two-semitone shift is +12.2% or -10.9%. A 
one-octave shift (12 semitones) is +100% or -50%, that is, doubling or halving pitch [4].
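
These percentages follow from the equal-tempered scale, in which each semitone multiplies frequency by the twelfth root of two. The Python sketch below computes the percentage for any number of semitones. The function name semitones_to_percent is hypothetical.

```python
def semitones_to_percent(semitones):
    """Percentage pitch change for a shift of the given number of semitones.

    Positive values are rises and negative values are drops; twelve
    semitones (one octave) double or halve the pitch.
    """
    return (2.0 ** (semitones / 12.0) - 1.0) * 100.0


one_up = round(semitones_to_percent(1), 1)
one_down = round(semitones_to_percent(-1), 1)
```

The result can be used directly in a relative PITCH value, for example by formatting it with an explicit sign and a trailing percent character.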

While speaking a sentence, pitch moves up and down in natural speech to convey extra information 
about what is being said. The baseline pitch represents the normal minimum pitch of a sentence. The 
pitch range represents the amount of variation in pitch above the baseline. Setting the baseline pitch and 
pitch range can affect whether speech sounds monotonous (small range) or dynamic (large range). 



Figure 4: Baseline Pitch and Pitch Range

Normal baseline pitch for a female voice is between 140Hz and 280Hz, with a pitch range of 80Hz or 
more. Male voices are typically lower: baseline of 70-140Hz, with a range of 40-80Hz.

Note that in all cases, relative values increase the portability of JSML across speaking voices and 
synthesizers. Relative settings allow users to apply the same JSML to different voices (e.g., male and 
female voices with very different pitch ranges) and to set a local preference for speaking rate. For 
example, some users set the speaking rate very high (300 words per minute or faster) so they can listen 
to a lot of text very quickly.

Example:

The <EMP/>ACME Trading Corporation, <PROS RANGE="-30%">which supplies cartoon
goods,</PROS> was purchased yesterday for <PROS RATE="-20%" VOL="+15%">
$2,060,000 </PROS> by <EMP> Road Runner </EMP> Incorporated.



6.0 Other Elements 

6.1 MARKER 



MARKER   Empty element that requests a notification when the synthesizer's production of audio
         reaches the marker. The MARK attribute is required.

MARK     Required attribute having a value of the text to be made available when a marker event
         occurs.



The marker element requests a notification from the speech synthesizer to the application when the 
mark is reached during the synthesizer's production of audio for the text. 

Example: 

Answer <MARKER MARK="yes_no_prompt"/> yes or no.



6.2 ENGINE 



ENGINE   Container element that provides information from the required DATA attribute to the
         synthesizer identified by the required ENGID attribute.

ENGID    Identifier for a speech synthesizer or a comma-separated set of speech synthesizer
         identifiers.

DATA     Required attribute having a value of the information for the synthesizer.

MARK     Optional attribute that requests a notification when the synthesizer's production of
         audio reaches this element's contained text. Its value is the text to be made available
         when the notification occurs.



The engine element allows applications to utilize a synthesizer's special capabilities. The element 
provides information, the value of the data attribute, to any speech synthesizers that are identified by 
the engid attribute. The information is generally a command in an engine-specific syntax. 

engine is a container element that is treated specially by a speech synthesizer that matches any engine 
specified in the engid. A matching engine should substitute the data for the text contained within the 
element. Other engines should ignore the data and instead process the contained text. For example, 
given the code 

I am <ENGINE ENGID="Acme Voice" DATA="Mr. Acme"> someone else</ENGINE> 

an Acme voice synthesizer will say "I am Mr. Acme" and all other speech synthesizers will say "I am 
someone else." A JSML document can contain ENGINE elements for any number of synthesizers. 
These elements can be nested. 
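
The substitution rule can be sketched as a simple selection between the DATA value and the contained text. In this Python example, the function name engine_output and its parameters are hypothetical, and matching is assumed to be an exact comparison against each identifier in the comma-separated ENGID value.

```python
def engine_output(engine_id, engid, data, contained_text):
    """Return what a synthesizer processes for one ENGINE element.

    engine_id:      identifier of the synthesizer doing the processing
    engid:          value of the ENGID attribute (may be comma-separated)
    data:           value of the DATA attribute
    contained_text: text between the ENGINE start-tag and end-tag
    """
    ids = [identifier.strip() for identifier in engid.split(",")]
    # A matching engine substitutes DATA; all others use the contained text.
    return data if engine_id in ids else contained_text


acme = "I am " + engine_output("Acme Voice", "Acme Voice", "Mr. Acme", "someone else")
other = "I am " + engine_output("Other Voice", "Acme Voice", "Mr. Acme", "someone else")
```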

Less-than signs ("<") or ampersands ("&") in a DATA attribute must be escaped to avoid being 
mistaken for JSML (see Escaping/Quoting Text).

For example:

<ENGINE ENGID="Croaker 1.0" DATA="&lt;ribbit=1>" MARK="frog start"> no frog
sound </ENGINE>



[1] World Wide Web Consortium Working Draft, Extensible Markup Language Version 1.0 (August 7, 
1997) at http://www.w3.org/TR/WD-xml-lang

[2] Words with the same spelling but different pronunciations. For example, "I will read it." and "I have 
read it."

[3] The meaning of "immediately following text" is language dependent. English speech synthesizers will 
emphasize the next word.

[4] Percentages for 1 to 12 semitone pitch rises are +5.9%, +12.2%, +18.9%, +26.0%, +33.5%, +41.4%, 
+50%, +58.7%, +68.2%, +78.2%, +88.8%, +100%.
Decreases are -5.6%, -10.9%, -15.9%, -20.6%, -25.1%, -29.3%, -33.3%, -37.0%, -40.5%, -43.9%, 
-47.0%, -50.0%.